Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction

Abstract

Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP), based on a single trajectory of Markovian samples induced by a behavior policy. Focusing on a $\gamma$-discounted MDP with state space $\mathcal{S}$ and action space $\mathcal{A}$, we demonstrate that the $\ell_{\infty}$-based sample complexity of classical asynchronous Q-learning is at most on the order of $\frac{1}{\mu_{\mathsf{min}}(1-\gamma)^{5}\varepsilon^{2}} + \frac{t_{\mathsf{mix}}}{\mu_{\mathsf{min}}(1-\gamma)}$ up to some logarithmic factor, provided that a proper constant learning rate is adopted. Here the sample complexity means the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function. The symbols $t_{\mathsf{mix}}$ and $\mu_{\mathsf{min}}$ denote, respectively, the mixing time and the minimum state-action occupancy probability of the sample trajectory. The first term of this bound matches the sample complexity in the synchronous case with independent samples drawn from the stationary distribution of the trajectory. The second term reflects the cost for the empirical distribution of the Markovian trajectory to reach a steady state. This cost is incurred at the very beginning and becomes amortized as the algorithm runs. Encouragingly, the above bound improves upon the state-of-the-art result by a factor of at least $|\mathcal{S}||\mathcal{A}|$ for all scenarios. It improves by a factor of at least $t_{\mathsf{mix}}|\mathcal{S}||\mathcal{A}|$ for any sufficiently small accuracy level $\varepsilon$. Further, we demonstrate that the scaling on the effective horizon $\frac{1}{1-\gamma}$ can be improved by means of variance reduction.
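
The abstract describes classical asynchronous Q-learning. It runs with a constant learning rate on one Markovian trajectory generated by a behavior policy, and only the currently visited state-action entry is updated at each step. Below is a minimal NumPy sketch of that update, assuming a small tabular MDP. Its transition tensor `P` and reward table `R` are hypothetical and serve only to generate the trajectory and a ground-truth $Q^\star$ (computed by value iteration). The function names `async_q_learning`, `value_iteration` and `leading_order_bound` are illustrative and do not come from the paper. The bound helper drops all constants and logarithmic factors, so it conveys scaling only.

```python
import numpy as np


def value_iteration(P, R, gamma, tol=1e-10):
    """Ground-truth Q* via repeated application of the Bellman optimality operator."""
    Q = np.zeros_like(R)
    while True:
        V = Q.max(axis=1)              # V(s) = max_a Q(s, a)
        Q_new = R + gamma * (P @ V)    # P: (S, A, S), V: (S,) -> (S, A)
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new


def async_q_learning(P, R, gamma, eta, num_steps, behavior, s0=0, seed=0):
    """Classical asynchronous Q-learning with a constant learning rate eta.

    Samples come from ONE Markovian trajectory generated by `behavior`;
    at step t only the visited entry (s_t, a_t) is updated:
        Q(s_t, a_t) <- (1 - eta) Q(s_t, a_t)
                       + eta * (R(s_t, a_t) + gamma * max_a' Q(s_{t+1}, a'))
    """
    rng = np.random.default_rng(seed)
    S, A = R.shape
    Q = np.zeros((S, A))
    s = s0
    for _ in range(num_steps):
        a = rng.choice(A, p=behavior[s])     # action drawn from the behavior policy
        s_next = rng.choice(S, p=P[s, a])    # next state drawn from the MDP
        target = R[s, a] + gamma * Q[s_next].max()
        Q[s, a] = (1 - eta) * Q[s, a] + eta * target
        s = s_next                           # continue along the same trajectory
    return Q


def leading_order_bound(mu_min, t_mix, gamma, eps):
    """Order of the abstract's bound, with constants and log factors dropped:
    1 / (mu_min (1-gamma)^5 eps^2) + t_mix / (mu_min (1-gamma))."""
    horizon = 1.0 / (1.0 - gamma)            # effective horizon 1/(1-gamma)
    return horizon**5 / (mu_min * eps**2) + t_mix * horizon / mu_min


# Hypothetical toy MDP: 3 states, 2 actions, rewards in [0, 1].
rng = np.random.default_rng(42)
S, A, gamma = 3, 2, 0.5
# Mixing in a uniform component keeps every next state reachable (prob >= 1/(2S)).
P = 0.5 / S + 0.5 * rng.dirichlet(np.ones(S), size=(S, A))
R = rng.uniform(0.0, 1.0, size=(S, A))
behavior = np.full((S, A), 1.0 / A)           # uniform behavior policy

Q_star = value_iteration(P, R, gamma)
Q_hat = async_q_learning(P, R, gamma, eta=0.02, num_steps=60_000, behavior=behavior)
print("entrywise error:", np.abs(Q_hat - Q_star).max())
```

The fixed step size mirrors the abstract's "proper constant learning rate" assumption. The particular value `eta=0.02`, the discount factor and the toy MDP are arbitrary illustration choices, not the rate prescribed by the paper's analysis.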

Similar resources

The Effect of Task Complexity on Lexical Complexity and Grammatical Accuracy of EFL Learners' Argumentative Writing

Based on Robinson's Cognition Hypothesis (2001, 2003, 2005) and Skehan's Limited Attentional Capacity model (1998), this study examined how task complexity affects lexical complexity and grammatical accuracy. It did so in the argumentative writing of 60 English-language students. The level of task complexity was determined by resource-dispersing factors. All participants were semi-randomly assigned to one of three groups: (1) the topic group, (2) the topic + idea group, and (3) the topic + ide...

Variance and sample size calculations in quality-of-life--adjusted survival analysis (Q-TWiST).

The Quality-Adjusted Time Without Symptoms or Toxicity (Q-TWiST) statistic previously introduced by Glasziou, Simes and Gelber (1990, Statistics in Medicine 9, 1259-1276) combines toxicity, disease-free survival, and overall survival information in assessing the impact of treatments on the lives of patients. This methodology has received positive reviews from clinicians as intuitive and useful,...

Parallel Asynchronous Stochastic Variance Reduction for Nonconvex Optimization

Nowadays, asynchronous parallel algorithms have received much attention in the optimization field due to the crucial demands of modern large-scale optimization problems. However, most asynchronous algorithms focus on convex problems, and analysis of nonconvex problems is lacking. For the Asynchronous Stochastic Gradient Descent (ASGD) algorithm, the best result from (Lian et al., 2015) can only achieve an ...

Asynchronous Doubly Stochastic Proximal Optimization with Variance Reduction

In the big data era, both the sample size and the dimension could be huge at the same time. Asynchronous parallel technology was recently proposed to handle big data. Specifically, asynchronous stochastic (variance reduction) gradient descent algorithms were recently proposed to scale the sample size, and asynchronous stochastic coordinate descent algorithms were proposed to scale the dimens...

Zeroth-order Asynchronous Doubly Stochastic Algorithm with Variance Reduction

Zeroth-order (derivative-free) optimization attracts a lot of attention in machine learning, because explicit gradient calculations may be computationally expensive or infeasible. To handle large scale problems both in volume and dimension, recently asynchronous doubly stochastic zeroth-order algorithms were proposed. The convergence rate of existing asynchronous doubly stochastic zeroth order ...

Journal

Journal title: IEEE Transactions on Information Theory

Year: 2022

ISSN: 0018-9448, 1557-9654

DOI: https://doi.org/10.1109/tit.2021.3120096